-> Decomposed the time series in an effort to replicate Serena’s approach.

-> Fit a linear regression to the decomposed trend in an attempt to replicate Serena’s approach. Interesting because the result was similar, but not identical.
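The decomposition and trend regression could be done along these lines (a sketch; the `Wind` data frame and its `Windstorms` column are assumptions inferred from the model calls in this report):

```r
library(zoo)  # for index()

# Build a monthly ts object from the raw counts and decompose it into
# trend, seasonal, and random components
windTS <- decompose(ts(Wind$Windstorms, frequency = 12))

# Regress the extracted trend on its time index, matching the
# lm(windTS$trend ~ index(windTS$trend)) call that appears later
trend.lm <- lm(windTS$trend ~ index(windTS$trend))
summary(trend.lm)
```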

## 
## Call:
## lm(formula = Underlying.Windstorms ~ Month, data = Underlying.Wind)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.29786 -0.35209 -0.02924  0.29417  1.67280 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.9958362  0.0729440  68.489  < 2e-16 ***
## Month       -0.0019724  0.0005248  -3.759 0.000215 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5633 on 238 degrees of freedom
## Multiple R-squared:  0.05603,    Adjusted R-squared:  0.05206 
## F-statistic: 14.13 on 1 and 238 DF,  p-value: 0.0002151


-> The residuals are not randomly distributed.
-> When a linear regression model is suitable for a data set, the residuals scatter more or less randomly around the 0 line.
-> Linear regression might therefore not be suitable here.
-> The data show increasing, non-constant variance.
-> A variance-stabilising transformation (log, sqrt, etc.) may be needed.
-> No outliers observed.
-> Basic linear regression using the Month number as the predictor.
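The basic model summarised below could be fit as follows (a sketch; `Wind` with `Windstorms` and `Month` columns is assumed from the Call line):

```r
# Simple trend-in-time regression on the raw monthly counts
base.lm <- lm(Windstorms ~ Month, data = Wind)
summary(base.lm)

# Residuals vs fitted, to check the randomness/variance points above
plot(base.lm, which = 1)
```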

## 
## Call:
## lm(formula = Windstorms ~ Month, data = Wind)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.751 -1.571 -0.075  1.380  6.169 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.232846   0.282569  18.519   <2e-16 ***
## Month       -0.003190   0.001936  -1.647    0.101    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.236 on 250 degrees of freedom
## Multiple R-squared:  0.01074,    Adjusted R-squared:  0.006779 
## F-statistic: 2.713 on 1 and 250 DF,  p-value: 0.1008

-> The same comments as above apply.
-> Using lags and a stepwise-reduced model.
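One way to build the 36-lag design matrix and reduce it (a sketch; object names are assumptions, and `step()`'s AIC-based search is one option — the exact reduction used here may have been manual):

```r
y.full <- Wind$Windstorms
n      <- length(y.full)

# Column k holds the value k months earlier, aligned with y
lags <- sapply(1:36, function(k) y.full[(37 - k):(n - k)])
df   <- data.frame(y = y.full[37:n], lags)
names(df)[-1] <- paste0("Lag", 1:36)

full.lm <- lm(y ~ ., data = df)          # all 36 lags
red.lm  <- step(full.lm, trace = 0)      # AIC-based stepwise reduction
summary(red.lm)
```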

## 
## Call:
## lm(formula = y ~ ., data = df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.802 -1.290 -0.007  1.011  4.787 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  5.303251   1.806995   2.935  0.00377 **
## Lag1         0.103447   0.071489   1.447  0.14963   
## Lag2         0.199806   0.071967   2.776  0.00608 **
## Lag3        -0.008333   0.073545  -0.113  0.90991   
## Lag4        -0.025409   0.074331  -0.342  0.73288   
## Lag5         0.069941   0.074600   0.938  0.34974   
## Lag6        -0.024152   0.074674  -0.323  0.74675   
## Lag7        -0.069821   0.073514  -0.950  0.34351   
## Lag8        -0.117487   0.073764  -1.593  0.11298   
## Lag9         0.033978   0.074336   0.457  0.64816   
## Lag10        0.096567   0.073381   1.316  0.18987   
## Lag11       -0.030853   0.073493  -0.420  0.67513   
## Lag12       -0.199938   0.072247  -2.767  0.00624 **
## Lag13       -0.052258   0.071358  -0.732  0.46493   
## Lag14       -0.003627   0.071620  -0.051  0.95967   
## Lag15        0.127517   0.072078   1.769  0.07857 . 
## Lag16       -0.015484   0.071380  -0.217  0.82851   
## Lag17       -0.027724   0.070675  -0.392  0.69532   
## Lag18        0.021696   0.069917   0.310  0.75669   
## Lag19       -0.126085   0.070002  -1.801  0.07336 . 
## Lag20        0.031782   0.070812   0.449  0.65410   
## Lag21       -0.005362   0.070998  -0.076  0.93988   
## Lag22       -0.076261   0.070911  -1.075  0.28362   
## Lag23        0.007255   0.070148   0.103  0.91774   
## Lag24       -0.128366   0.069457  -1.848  0.06623 . 
## Lag25        0.093584   0.069472   1.347  0.17966   
## Lag26        0.068664   0.069930   0.982  0.32748   
## Lag27        0.092229   0.069906   1.319  0.18874   
## Lag28        0.123940   0.070584   1.756  0.08081 . 
## Lag29       -0.042756   0.069867  -0.612  0.54134   
## Lag30       -0.146594   0.070340  -2.084  0.03857 * 
## Lag31       -0.185397   0.070938  -2.614  0.00972 **
## Lag32        0.017330   0.071628   0.242  0.80910   
## Lag33        0.001821   0.071590   0.025  0.97974   
## Lag34       -0.104696   0.071614  -1.462  0.14551   
## Lag35        0.167938   0.071539   2.347  0.01999 * 
## Lag36        0.005321   0.071072   0.075  0.94040   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.874 on 179 degrees of freedom
## Multiple R-squared:  0.4215, Adjusted R-squared:  0.3051 
## F-statistic: 3.623 on 36 and 179 DF,  p-value: 6.343e-09

## [1] 1.705796
## 
## Call:
## lm(formula = y ~ Lag1 + Lag2 + Lag8 + Lag10 + Lag12 + Lag15 + 
##     Lag19 + Lag24 + Lag25 + Lag28 + Lag30 + Lag31 + Lag34 + Lag35, 
##     data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1008 -1.2866  0.0711  1.0925  4.8070 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.05235    1.09800   4.601 7.42e-06 ***
## Lag1         0.10826    0.06235   1.736 0.084046 .  
## Lag2         0.23490    0.06141   3.825 0.000174 ***
## Lag8        -0.11370    0.06304  -1.803 0.072810 .  
## Lag10        0.09033    0.06345   1.424 0.156080    
## Lag12       -0.21746    0.06331  -3.435 0.000720 ***
## Lag15        0.16968    0.06262   2.710 0.007318 ** 
## Lag19       -0.17127    0.06098  -2.809 0.005465 ** 
## Lag24       -0.12018    0.06055  -1.985 0.048528 *  
## Lag25        0.09612    0.06013   1.599 0.111482    
## Lag28        0.10275    0.05923   1.735 0.084326 .  
## Lag30       -0.11305    0.05876  -1.924 0.055791 .  
## Lag31       -0.16941    0.06100  -2.777 0.006003 ** 
## Lag34       -0.11002    0.06118  -1.798 0.073631 .  
## Lag35        0.13733    0.06197   2.216 0.027809 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.807 on 201 degrees of freedom
## Multiple R-squared:  0.3956, Adjusted R-squared:  0.3535 
## F-statistic: 9.397 on 14 and 201 DF,  p-value: 7.787e-16

## [1] 1.74355

-> Improved randomness in the residual distribution.
-> Variance is improved and appears relatively constant.
-> Linearity is preserved.
-> No outliers observed.
-> The overall trend is preserved.
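The residual checks behind these comments can be produced with the default `lm` diagnostic plots (a sketch; `red.lm` stands in for the reduced lag model):

```r
# Four standard diagnostics: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
par(mfrow = c(2, 2))
plot(red.lm)
```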

## 
## Call:
## lm(formula = y ~ ., data = df.train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.2057 -1.1163 -0.0251  1.0474  4.3263 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  6.1652666  1.8937659   3.256  0.00143 **
## Lag1         0.1651594  0.0801662   2.060  0.04128 * 
## Lag2         0.1712802  0.0798110   2.146  0.03364 * 
## Lag3         0.0054553  0.0801964   0.068  0.94587   
## Lag4         0.0197786  0.0796852   0.248  0.80435   
## Lag5         0.0440437  0.0792582   0.556  0.57933   
## Lag6         0.0156823  0.0790371   0.198  0.84302   
## Lag7        -0.0864357  0.0774582  -1.116  0.26643   
## Lag8        -0.1864329  0.0772313  -2.414  0.01711 * 
## Lag9         0.0175569  0.0782057   0.224  0.82271   
## Lag10        0.1493430  0.0777118   1.922  0.05673 . 
## Lag11       -0.0077403  0.0781662  -0.099  0.92127   
## Lag12       -0.1923934  0.0766485  -2.510  0.01324 * 
## Lag13       -0.0955724  0.0756511  -1.263  0.20863   
## Lag14       -0.0784204  0.0753650  -1.041  0.29993   
## Lag15        0.1617555  0.0750741   2.155  0.03295 * 
## Lag16       -0.0557097  0.0747308  -0.745  0.45727   
## Lag17       -0.0571948  0.0737398  -0.776  0.43931   
## Lag18        0.0444990  0.0733280   0.607  0.54496   
## Lag19       -0.0944010  0.0729117  -1.295  0.19761   
## Lag20       -0.0004723  0.0737548  -0.006  0.99490   
## Lag21       -0.0552491  0.0740898  -0.746  0.45713   
## Lag22       -0.0752446  0.0739539  -1.017  0.31074   
## Lag23        0.0382961  0.0736523   0.520  0.60394   
## Lag24       -0.1153656  0.0725775  -1.590  0.11426   
## Lag25        0.0787099  0.0723410   1.088  0.27850   
## Lag26        0.1006884  0.0731419   1.377  0.17089   
## Lag27        0.0127424  0.0732255   0.174  0.86211   
## Lag28        0.1631044  0.0737925   2.210  0.02876 * 
## Lag29       -0.0279901  0.0730933  -0.383  0.70236   
## Lag30       -0.1217811  0.0739390  -1.647  0.10186   
## Lag31       -0.1904145  0.0747048  -2.549  0.01192 * 
## Lag32        0.0125182  0.0748956   0.167  0.86751   
## Lag33       -0.0050747  0.0753990  -0.067  0.94644   
## Lag34       -0.1277886  0.0770638  -1.658  0.09958 . 
## Lag35        0.1403681  0.0770323   1.822  0.07062 . 
## Lag36       -0.0676444  0.0765462  -0.884  0.37841   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.813 on 136 degrees of freedom
## Multiple R-squared:  0.4811, Adjusted R-squared:  0.3438 
## F-statistic: 3.503 on 36 and 136 DF,  p-value: 7.205e-08

## [1] 2.261222
## 
## Call:
## lm(formula = y ~ Lag1 + Lag2 + Lag8 + Lag10 + Lag12 + Lag15 + 
##     Lag19 + Lag24 + Lag26 + Lag28 + Lag30 + Lag31 + Lag34 + Lag35, 
##     data = df.train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9073 -1.2893  0.0145  1.0613  4.3768 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.61850    1.17526   3.930 0.000127 ***
## Lag1         0.14579    0.06943   2.100 0.037323 *  
## Lag2         0.19075    0.06683   2.854 0.004892 ** 
## Lag8        -0.16470    0.06607  -2.493 0.013704 *  
## Lag10        0.12824    0.06743   1.902 0.059001 .  
## Lag12       -0.20981    0.06668  -3.147 0.001975 ** 
## Lag15        0.17186    0.06526   2.633 0.009296 ** 
## Lag19       -0.10863    0.06392  -1.700 0.091186 .  
## Lag24       -0.12989    0.06293  -2.064 0.040638 *  
## Lag26        0.09544    0.06356   1.502 0.135176    
## Lag28        0.16378    0.06129   2.672 0.008329 ** 
## Lag30       -0.09574    0.06128  -1.562 0.120222    
## Lag31       -0.18578    0.06424  -2.892 0.004367 ** 
## Lag34       -0.13432    0.06504  -2.065 0.040558 *  
## Lag35        0.15682    0.06587   2.381 0.018469 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.736 on 158 degrees of freedom
## Multiple R-squared:  0.4473, Adjusted R-squared:  0.3983 
## F-statistic: 9.133 on 14 and 158 DF,  p-value: 1.749e-14

## [1] 2.145673

-> The same comments as for the previous models apply.

## 
## Call:
## lm(formula = Windstorms ~ Month, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6885 -1.6722 -0.0993  1.3928  6.2098 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.302632   0.308831  17.170   <2e-16 ***
## Month       -0.004067   0.002550  -1.595    0.112    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.224 on 207 degrees of freedom
## Multiple R-squared:  0.01214,    Adjusted R-squared:  0.007365 
## F-statistic: 2.543 on 1 and 207 DF,  p-value: 0.1123

## [1] 2.299485

-> Further improved randomness in the training-data residuals.
-> Variance is further improved and appears relatively constant.
-> Linearity is still preserved.
-> No outliers observed in the training data.
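If the printed scalars (e.g. `2.299485`) are hold-out errors, they could be computed as a test-set RMSE (a sketch; `df.test` and the fitted model `red.lm` are assumptions, since the evaluation code is not shown):

```r
# Predict on the held-out rows and score against the truth
pred <- predict(red.lm, newdata = df.test)
rmse <- sqrt(mean((df.test$y - pred)^2))
rmse
```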

## 
## Call:
## glm(formula = y ~ ., family = "poisson", data = df.train)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.96305  -0.61141   0.02118   0.46553   2.01335  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.8243562  0.4859019   3.755 0.000174 ***
## Lag1         0.0319557  0.0201206   1.588 0.112239    
## Lag2         0.0343060  0.0204484   1.678 0.093408 .  
## Lag3        -0.0006835  0.0206323  -0.033 0.973572    
## Lag4         0.0018537  0.0195971   0.095 0.924641    
## Lag5         0.0079283  0.0198476   0.399 0.689554    
## Lag6         0.0044299  0.0202349   0.219 0.826711    
## Lag7        -0.0129419  0.0199265  -0.649 0.516025    
## Lag8        -0.0403506  0.0202402  -1.994 0.046197 *  
## Lag9         0.0011141  0.0197335   0.056 0.954976    
## Lag10        0.0314161  0.0203940   1.540 0.123448    
## Lag11        0.0028507  0.0200140   0.142 0.886737    
## Lag12       -0.0441263  0.0195849  -2.253 0.024254 *  
## Lag13       -0.0183676  0.0196366  -0.935 0.349593    
## Lag14       -0.0136586  0.0194065  -0.704 0.481546    
## Lag15        0.0320105  0.0186735   1.714 0.086489 .  
## Lag16       -0.0085467  0.0187153  -0.457 0.647910    
## Lag17       -0.0100425  0.0184779  -0.543 0.586794    
## Lag18        0.0066532  0.0183591   0.362 0.717059    
## Lag19       -0.0224879  0.0186172  -1.208 0.227082    
## Lag20        0.0016922  0.0191991   0.088 0.929764    
## Lag21       -0.0080811  0.0190933  -0.423 0.672120    
## Lag22       -0.0148979  0.0191178  -0.779 0.435821    
## Lag23        0.0077668  0.0185850   0.418 0.676015    
## Lag24       -0.0334716  0.0191717  -1.746 0.080830 .  
## Lag25        0.0207022  0.0185789   1.114 0.265157    
## Lag26        0.0218580  0.0188488   1.160 0.246190    
## Lag27        0.0062525  0.0184735   0.338 0.735019    
## Lag28        0.0354825  0.0186218   1.905 0.056724 .  
## Lag29       -0.0053556  0.0188092  -0.285 0.775848    
## Lag30       -0.0250875  0.0194278  -1.291 0.196594    
## Lag31       -0.0458637  0.0196851  -2.330 0.019813 *  
## Lag32       -0.0034065  0.0194956  -0.175 0.861293    
## Lag33       -0.0018370  0.0195285  -0.094 0.925057    
## Lag34       -0.0244465  0.0204344  -1.196 0.231564    
## Lag35        0.0289662  0.0198620   1.458 0.144738    
## Lag36       -0.0116835  0.0201027  -0.581 0.561111    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 192.19  on 172  degrees of freedom
## Residual deviance: 103.92  on 136  degrees of freedom
## AIC: 749.12
## 
## Number of Fisher Scoring iterations: 4

## [1] 2.191144
## 
## Call:
## glm(formula = y ~ Lag1 + Lag2 + Lag8 + Lag10 + Lag12 + Lag15 + 
##     Lag19 + Lag24 + Lag28 + Lag31 + Lag34 + Lag35, family = "poisson", 
##     data = df.train)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.05023  -0.60999   0.03858   0.47768   1.95468  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.50575    0.29010   5.190  2.1e-07 ***
## Lag1         0.03945    0.01749   2.255   0.0241 *  
## Lag2         0.04375    0.01732   2.526   0.0115 *  
## Lag8        -0.03673    0.01771  -2.074   0.0380 *  
## Lag10        0.03114    0.01788   1.742   0.0815 .  
## Lag12       -0.04858    0.01765  -2.753   0.0059 ** 
## Lag15        0.03967    0.01683   2.357   0.0184 *  
## Lag19       -0.03177    0.01667  -1.906   0.0567 .  
## Lag24       -0.02982    0.01665  -1.791   0.0732 .  
## Lag28        0.03511    0.01608   2.183   0.0290 *  
## Lag31       -0.04409    0.01748  -2.522   0.0117 *  
## Lag34       -0.02991    0.01725  -1.734   0.0829 .  
## Lag35        0.03139    0.01705   1.841   0.0657 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 192.19  on 172  degrees of freedom
## Residual deviance: 112.13  on 160  degrees of freedom
## AIC: 709.33
## 
## Number of Fisher Scoring iterations: 4

## [1] 2.139466

-> No real observable improvement over the previous models.
-> Lag12 may be significant because of a cyclical (seasonal) pattern that repeats every 12 lags (i.e. Lag12, Lag24, Lag36).
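A quick adequacy check for the Poisson fits above is the dispersion ratio: residual deviance over residual degrees of freedom should be near 1 (here 112.13 / 160 ≈ 0.70). A sketch, with `pois.fit` standing in for the reduced GLM:

```r
# A ratio well above 1 would signal overdispersion and suggest
# refitting with family = quasipoisson or MASS::glm.nb instead
deviance(pois.fit) / df.residual(pois.fit)
```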


## Series: trend 
## ARIMA(0,1,0)(1,0,1)[12] 
## 
## Coefficients:
##          sar1     sma1
##       -0.0533  -0.8644
## s.e.   0.0805   0.0594
## 
## sigma^2 = 0.01455:  log likelihood = 158.57
## AIC=-311.13   AICc=-311.03   BIC=-300.7

-> The output shows that the best-fitting ARIMA model for the trend component has order (0,1,0) for the non-seasonal part and (1,0,1) for the seasonal part, with a period of 12 (denoted [12]).
-> The log likelihood of the model is 158.57, and the Akaike Information Criterion (AIC), corrected AIC (AICc), and Bayesian Information Criterion (BIC) are -311.13, -311.03, and -300.7, respectively.
-> The model therefore represents the trend component as a random walk (ARIMA(0,1,0); note there is no drift term in the output) combined with a seasonal ARMA(1,1) component at lag 12.
-> One takeaway from this analysis is that there is a seasonal pattern in the trend of the number of windstorms per month.
-> The non-seasonal order (0,1,0) means the first difference of the series (the difference between consecutive observations) is modelled as white noise, so the series itself behaves like a random walk. This suggests the trend component is not stationary in levels.
-> The seasonal order (1,0,1)[12] indicates seasonality with a period of 12 months. Specifically, the trend carries both a seasonal autoregressive term (its value at the same month in the previous year) and a seasonal moving-average term (the shock from the same month in the previous year).
-> The relatively small estimated sigma-squared (0.01455) indicates that the variance of the trend component is low, which could suggest the series is relatively stable and predictable. Caution is warranted, however, since this describes only the smoothed trend, not the noisier original series.
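The fit above can be reproduced with `forecast::auto.arima` (a sketch; `windTS$trend` is the decomposed trend used elsewhere in this report):

```r
library(forecast)

# decompose() leaves NA padding at both ends of the extracted trend,
# which auto.arima cannot handle
trend.clean <- na.omit(windTS$trend)

# Search over (p,d,q)(P,D,Q)[12] candidates by AICc
fit.arima <- auto.arima(trend.clean)
summary(fit.arima)
```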

## 
## Call:
## lm(formula = windTS$trend ~ index(windTS$trend))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.29786 -0.35209 -0.02924  0.29417  1.67280 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         52.367429  12.667130   4.134 4.94e-05 ***
## index(windTS$trend) -0.023669   0.006297  -3.759 0.000215 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5633 on 238 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.05603,    Adjusted R-squared:  0.05206 
## F-statistic: 14.13 on 1 and 238 DF,  p-value: 0.0002151

-> The decomposition of the time series into its components showed that there is a seasonal pattern in the data, with the number of storms peaking in the winter months and decreasing in the summer months.
-> The trend component of the time series showed a slight decrease in the number of storms over time, although the relationship with time is not very strong.
-> The auto.arima function suggested an ARIMA(0,1,0)(1,0,1)[12] model for the trend component of the time series, which includes seasonal autoregressive and seasonal moving-average components.
-> The diagnostic plots for the ARIMA model showed that the residuals were approximately normally distributed and had constant variance over time, suggesting that the model is a good fit for the data.
-> Based on the forecast from the ARIMA model, it is predicted that the number of storms will continue to decrease slightly over the next 12 months.
-> The main takeaway from this analysis is that while there is evidence of a slight downward trend in the number of windstorms over time, the relationship is not very strong and there is still considerable variability in the data. Therefore, it is important to continue monitoring the data and update the analysis as more data becomes available.
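The 12-month forecast referred to above could be produced along these lines (a sketch; `fit.arima` stands in for the fitted seasonal ARIMA model):

```r
library(forecast)

fc <- forecast(fit.arima, h = 12)  # point forecasts + 80/95% intervals
plot(fc)
fc$mean                            # the 12 monthly point forecasts
```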

Based on the analysis so far, some recommendations for further analysis could include:

Incorporating other relevant variables: While the analysis so far has focused solely on the trend component of the time series, it may be useful to incorporate other variables that could impact the number of windstorms. For example, temperature, humidity, and pressure could all have an effect on the occurrence of windstorms.

Confirming seasonality: Although the decomposition and the seasonal ARIMA fit point to a seasonal pattern in the data, it may be worthwhile to confirm this with other statistical methods, such as Fourier (spectral) analysis.

Exploring other time series models: While the auto.arima function is a useful tool for time series modeling, there are other models that could be explored, such as Vector Autoregression (VAR), GARCH, or state-space models.